Abstract. We describe an experiment designed to evaluate the use of the "Cortex
Transform" (Watson, 1987) as an image preprocessor for Sparse Distributed
Memory (SDM). In the experiment, a set of images was injected with Gaussian noise,
preprocessed with the Cortex Transform, and then encoded into bit patterns.
The various spatial frequency bands of the Cortex Transform were encoded
separately so that they could be evaluated based on their ability to properly
cluster patterns belonging to the same class. The results of this study indicate
that by simply encoding the low-pass band of the Cortex Transform, a very
suitable input representation for the SDM can be achieved.

Sparse Distributed Memory (SDM), an associative memory described by Kanerva 
(1988), is well-suited to perform high-level object recognition tasks because of its ability to 
quickly classify patterns on the basis of incomplete or corrupted information. This ability 
would be especially useful for visual recognition tasks, where typically an object in a scene 
must be quickly identified despite the presence of noise and distortions in the imaging 
process, or variations in the shape of the object. However, before SDM can be applied to
visual object recognition problems, it is necessary to determine how raw images should be 
preprocessed and encoded in order to form a suitable input for the SDM. 

To determine how raw images should be processed, it is first necessary to consider 
what types of variations in the image may interfere with the proper classification of an 
object. In this case, since we are interested in applying SDM to the problem of recognizing 
2D shapes, we need to be concerned with such image variations as pixel noise, changes in 
contrast, line thickness, or even slight variations in the shape’s structure (e.g., hand-drawn 
characters). At this stage, however, we concern ourselves with the case of pixel noise only 
(i.e., an independent and identically distributed Gaussian process added to each image pixel 
value). 

Because the SDM uses the Hamming distance between two bit-patterns as a 
measure of their “closeness,” our goal in preprocessing and encoding the image is to 
develop a bit-string representation of the image such that two shapes belonging to the same 
class give rise to bit-strings that are close in Hamming distance. Conversely, shapes 
belonging to different classes should give rise to bit-strings that are well-separated in 
Hamming distance. In this paper, we examine how well the Cortex Transform (Watson,
1987), serving as the preprocessor, accomplishes this goal for images that have been
perturbed with pixel noise only.
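
The closeness criterion described above can be illustrated with a minimal sketch. The function name `hamming` and the example bit-strings are hypothetical and chosen only for illustration; they are not taken from the experiment.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit-strings differ."""
    if len(a) != len(b):
        raise ValueError("bit-strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical bit-strings: a good encoding keeps same-class patterns
# close and different-class patterns far apart in Hamming distance.
reference = "00110111"
same_class = "00110011"    # differs from the reference in one position
other_class = "11110000"   # differs from the reference in five positions

print(hamming(reference, same_class))
print(hamming(reference, other_class))
```

Under this criterion, a preprocessor is useful to the extent that it makes the first kind of distance small relative to the second.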

The Cortex Transform 

The Cortex Transform is described in detail by Watson (1987, 1988). Here we 
discuss only the important features that were used in the experiment.

The Cortex Transform subdivides the content of an image into different spatial- 
frequency bands by filtering the image with a set of oriented, bandpass filters as shown in 
Figure 1. This process converts a single image into multiple images, each of which contains 
a unique subset of the spatial frequencies present in the original image. When properly
sub-sampled, these images can provide a very compact representation of the original image
because their pixels have very little correlation with one another. This property is not only
highly desirable for image compression, but would also be useful in preprocessing images for
SDM, because in the encoding process we wish to maximize the information content
of each pixel being encoded. If the pixels are uncorrelated with one another, then each pixel is
“saying” the most it can about the content of the image.

In general, the Cortex Transform produces enough output images to give a complete
representation of all the different spatial frequency bands in an image, as depicted in
Figure 1. For our purposes, though, we chose to use only a portion of the bands: two
different band-pass filters subdivided into four orientations each, and two different
low-pass filters, as shown in Figure 2. Note that each set of band-pass filters results in a set of
four images - one image for each orientation - while each of the low-pass filters results in
only one image. All the filtered images were sub-sampled so as to reduce the number of
pixels with only a negligible loss of information (see Watson, 1988).

While the output images of the Cortex Transform contain both the magnitude and 
phase of spatial frequency components, we considered only the magnitude to be important. 
Our reason for this is that we were not interested in the exact location of spatial frequency 
components (which the phase would provide) but, more importantly, in the extent to which
they are present in the image and their approximate location.


Encoding the Cortex Transform 

The magnitudes of the filtered images were encoded into bit-strings by first quantizing the
pixel values of an image into five levels each and then using a 4-bit thermometer-type code,
as shown in Table 1, to encode each pixel. The quantization levels for an image were
determined by finding the minimum and maximum pixel values in the image, and then setting
the quantization thresholds at uniform intervals over the range from minimum to maximum.
A set of four images (corresponding to four orientations) was similarly quantized, except
that the minimum and maximum pixel values were computed over the set of all four images,
and the ensemble was quantized as a whole.

    pixel value    bit-string
    0              0000
    1              0001
    2              0011
    3              0111
    4              1111

Table 1: Encoding quantized pixel values into bit-strings.

A bit-string representation of the image was then formed by simply concatenating the 
4-bit codes for each pixel into one long bit-string. Thus, an NxN low-pass filtered image 
would result in a bit-string of length NxNx4, and four (oriented) NxN band-pass filtered 
images would result in a single bit-string of length NxNx16.
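
The encoding step can be sketched as follows. This is an illustrative sketch, not the code used in the experiment: the function names are hypothetical, images are represented as flat lists of magnitudes, and the exact placement of the quantization thresholds (five equal-width bins between the minimum and maximum) is an assumption consistent with, but not confirmed by, the description above.

```python
# Table 1: quantized pixel value -> 4-bit thermometer code.
THERMOMETER = ["0000", "0001", "0011", "0111", "1111"]


def quantize(pixels, lo, hi, levels=5):
    """Map each value to a level 0..levels-1 using equal-width bins on [lo, hi].

    Assumption: thresholds sit at uniform intervals of (hi - lo) / levels.
    """
    if hi == lo:
        return [0 for _ in pixels]
    step = (hi - lo) / levels
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((p - lo) / step), levels - 1) for p in pixels]


def encode_image(pixels):
    """Encode one (low-pass) image using its own minimum and maximum."""
    lo, hi = min(pixels), max(pixels)
    return "".join(THERMOMETER[q] for q in quantize(pixels, lo, hi))


def encode_oriented_set(images):
    """Encode a set of orientation images quantized as one ensemble."""
    flat = [p for image in images for p in image]
    lo, hi = min(flat), max(flat)
    return "".join(
        THERMOMETER[q] for image in images for q in quantize(image, lo, hi)
    )


print(encode_image([0.0, 1.0, 2.0, 3.0, 4.0]))
```

With N = 2, a single image of four pixels yields 2x2x4 = 16 bits and a set of four orientation images yields 2x2x16 = 64 bits, matching the lengths stated above.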

Experiments 

The shapes used in this experiment were the 26 capital letters of the alphabet,
extracted from the Courier 24-point font on a SUN 3/60 workstation, as shown in Figure 3. 
Each image contained only two pixel values: 0 for off (white) and 255 for on (black). These 
images were then injected with noise by adding an i.i.d. Gaussian process to the pixel 
values. (Note: since the output of the noise injection process was a floating point image, no 
special action needed to be taken for pixel values lying outside the interval [0, 255].) This 
was done using standard deviations (σ) of 40, 80, and 120. For each standard deviation, 20
samples of noise were generated. Thus, each letter had a total of 60 different noisy
instances (3 values of σ x 20 instances per σ). These images, in addition to the 26 original
(non-noisy) images, were then processed with the Cortex Transform under the four filter
arrangements shown in Figure 2. The filtered images were then sub-sampled and encoded
at 4 bits/pixel as described above. Since the original images were padded into a 32x32 square,
only the central regions of each sub-sampled, filtered image needed to be encoded, as
shown in Figure 4.

This entire process is illustrated in Figure 5. 
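
The noise-injection step can be sketched as below. The function names are hypothetical, the noise is assumed to be zero-mean (the text specifies only the standard deviations), and the fixed seed is there only to make the sketch repeatable.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add i.i.d. Gaussian noise (assumed zero-mean) to every pixel.

    The result is a floating-point image, so values outside [0, 255]
    are left as they are rather than clipped.
    """
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]


def noisy_instances(image, sigmas=(40, 80, 120), samples=20, seed=0):
    """Return (sigma, noisy_image) pairs: `samples` instances per sigma."""
    rng = random.Random(seed)
    return [
        (sigma, add_gaussian_noise(image, sigma, rng))
        for sigma in sigmas
        for _ in range(samples)
    ]


# Tiny stand-in for a 32x32 letter image (0 = off, 255 = on).
letter = [[0, 255], [255, 0]]
instances = noisy_instances(letter)
print(len(instances))
```

Each letter therefore contributes 3 x 20 = 60 noisy images in addition to its original.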

Once the images were encoded, the resulting bit-patterns were compared by 
Hamming distance to determine how well patterns of the same class were clustered 
together. This was done as follows: 

Assuming we have chosen a particular standard deviation of noise, σ, and a particular
filtering strategy (one of the four shown in Fig. 2), we denote the j-th noisy instance of
pattern i as

$$P_{ij}, \quad i = 1, \ldots, 26, \; j = 1, \ldots, 20,$$

and we denote the original (non-noisy) instance of pattern i as

$$P_i^*, \quad i = 1, \ldots, 26$$

(P is a bit-pattern obtained by encoding a preprocessed image).

Then, we denote the Hamming distance between the original (non-noisy) instance of
pattern class k and the j-th noisy instance of pattern class i as

$$d_{kij} = \left\| P_k^* - P_{ij} \right\|_{L1},$$

where $\| \cdot \|_{L1}$ denotes the L1 norm (Hamming distance for bit-strings). These
distances were computed for all k, i, j. The distances were then accumulated into two
different histograms for each k: one histogram for comparisons between $P_k^*$ and instances
within class k (which we denote $H^{\mathrm{in}}_k$), and another for comparisons between $P_k^*$ and
instances from classes other than k (which we denote $H^{\mathrm{not\,in}}_k$). Formally, this can be
expressed as

$$H^{\mathrm{in}}_k(d) = \sum_{i = k} \sum_{j=1}^{20} \delta(d, d_{kij})$$

and

$$H^{\mathrm{not\,in}}_k(d) = \sum_{i \neq k} \sum_{j=1}^{20} \delta(d, d_{kij}),$$

where $\delta(a, b)$ is 1 if $a = b$ and 0 otherwise.

Then, by integrating each histogram up to some distance, D, we obtain a measure of
the signal-to-noise ratio that would result when reading the SDM with Hamming radius D.
That is,

$$S^{\mathrm{in}}_k(D) = \sum_{d=0}^{D} H^{\mathrm{in}}_k(d),$$

and

$$S^{\mathrm{not\,in}}_k(D) = \sum_{d=0}^{D} H^{\mathrm{not\,in}}_k(d).$$

If $S^{\mathrm{not\,in}}_k(D)$ is less than $S^{\mathrm{in}}_k(D)$, this would indicate that there are more instances
of class k than of any other class within D. In terms of reading from the SDM, then, this
would mean that there is at least a possibility of recovering the correct data when reading
the memory with $P_k^*$ as the address and using Hamming radius D. Otherwise, the data
corresponding to $P_k^*$ would most certainly be overwhelmed by data from other classes.

Thus, we use the function

$$p_k(D) = \begin{cases} 1, & \text{if } S^{\mathrm{in}}_k(D) > S^{\mathrm{not\,in}}_k(D), \\ 0, & \text{otherwise,} \end{cases}$$

to denote whether class k “passes” (1) or “fails” (0) at Hamming distance D. Then,
in order to get a global measure of performance at Hamming distance D, we average $p_k(D)$
over all classes:

$$\mathrm{Perf}(D) = \frac{1}{26} \sum_{k=1}^{26} p_k(D).$$

This function provides us with a reasonable measure by which to judge the 
representation formed by each of the filter processes. 
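
The performance measure can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names and the dictionary layout (class label to bit-string, and class label to a list of noisy bit-strings) are hypothetical, and summing a histogram up to D is implemented directly as counting the distances that are at most D.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit-strings."""
    return sum(x != y for x, y in zip(a, b))


def perf(originals, noisy, radius):
    """Fraction of classes that "pass" at Hamming radius `radius`.

    originals: dict mapping class label -> clean bit-string
    noisy:     dict mapping class label -> list of noisy bit-strings
    """
    passes = 0
    for k, p_star in originals.items():
        # Count of same-class noisy instances within the radius.
        s_in = sum(hamming(p_star, p) <= radius for p in noisy[k])
        # Count of noisy instances from all other classes within the radius.
        s_not_in = sum(
            hamming(p_star, p) <= radius
            for i, patterns in noisy.items()
            if i != k
            for p in patterns
        )
        passes += 1 if s_in > s_not_in else 0
    return passes / len(originals)


# Hypothetical two-class toy data, not taken from the experiment.
originals = {"A": "00000000", "B": "11111111"}
noisy = {"A": ["00000001", "00000011"], "B": ["11111110", "00000111"]}
for radius in (0, 2, 8):
    print(radius, perf(originals, noisy, radius))
```

In this toy data both classes pass at radius 2, while at radius 0 nothing is within reach and at radius 8 every pattern is, so neither class passes at those extremes.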
Results


The function Perf(D) is plotted in Figures 6-11 for the different standard deviations of
noise (σ = 40, 80, 120) and filtering strategies tested. Figures 6-8 plot the performance of the
low-pass filters, and Figures 9-11 plot the performance of the oriented, band-pass filters
along with the performance of the raw image (i.e., no preprocessing).

For nearly all cases, the lowest-pass filter (i.e., the filter with the lowest cutoff 
frequency) provided the best performance, yielding 96% “passes” at D=5 bits (Fig. 6). This 
filter has a cutoff of 0-0.125 cycles/pixel and compresses the original 17x15 pixels to 5x4,
yielding a 12-fold decrease in the number of pixels (and hence the number of bits in the SDM 
input bit-string). 
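
As a rough check of these figures (a worked calculation, assuming the 4 bits/pixel encoding described earlier is applied to the 5x4 region):

$$17 \times 15 = 255 \text{ pixels}, \qquad 5 \times 4 = 20 \text{ pixels}, \qquad \frac{255}{20} = 12.75,$$

$$20 \text{ pixels} \times 4 \text{ bits/pixel} = 80 \text{ bits}.$$

The ratio is consistent with the roughly 12-fold reduction quoted above, and under this assumption the lowest-pass representation occupies 80 bits per image.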

Note that with a high standard deviation of noise (σ = 120), the lowest-pass band no
longer provides the best performance (Fig. 8). The best performance is provided instead by 
the next lowest band (0-0.25 cycles/pixel). Also at this noise level, the performance of the 
highest-pass band filter (0.25-0.5 cycles/pixel) is worse than that of the raw image with no 
preprocessing (Fig. 11). However, in all other cases, preprocessing yields an improvement 
in the image representation. 

Conclusion 

Our goal in this experiment was to evaluate the Cortex Transform as a preprocessor 
for Sparse Distributed Memory. The results indicate that for 2-D shapes for which the 
image has been perturbed with pixel noise only, a dramatic improvement in the image 
representation may be obtained by encoding the low spatial-frequency bands of the Cortex 
Transform. 

It should be noted that this experiment was intended as an initial study to evaluate 
the use of multi-resolution or oriented filters with SDM. There are many further extensions 
to this work. One would be to compute the performance of the various filters for other image 
variations, such as line thickness, contrast, or small structural variations. Another
possibility would be to examine other encoding strategies, such as the use of real numbers 
instead of bits. In this case, it may also be useful to investigate the effect of using an L2 
distance metric instead of L1.

An important advantage of conducting tests such as those described here is that it is
possible to evaluate a representation by itself without getting involved with the 
implementation details of SDM. This allows one to focus on the special problems of the 
application domain - such as in this case, the variations that can take place in an image - 
before employing more powerful, higher-level machinery. 
Figure 2: The Cortex filters used in the experiment. There are two sets of oriented, band-pass filters an octave apart, and two
different low-pass filters an octave apart. The original image contains 32x32 pixels; thus, each spatial-frequency “image” also contains 32x32
pixels (in this case, spatial frequency coefficients). Note that each set of four band-pass filters will result in a set of four filtered images - one for
each orientation - while each of the low-pass filters will result in only one image.
Figure 4: Extracting the central regions from the sub-sampled, filtered images. (Panels: bandpass-1 and lowpass-1, subsampled 2:1;
bandpass-2 and lowpass-2, subsampled 4:1.) Since the original shape is 15x17 but padded into a 32x32 square, the only pixels in
the output that need to be encoded are those that correspond to the central 15x17 region of the
original image. These pixels are shown (shaded areas) for the sub-sampled, filtered images.
Figure 5: Process for generating bit-strings from the image. The original image is injected with noise at σ = 40, 80, and 120.
This is done 20 times for each value of σ. The resulting noisy images, in addition to the original (non-noisy) image, are then filtered in four
different ways with the Cortex Transform. These images are then sub-sampled and encoded to form bit-strings. Thus, the process converts each of
the 32x32 font images into 244 different bit-strings ((60 noisy instances + 1 original) x 4 different filters). The set of bit-strings resulting
from each Cortex filter can then be used in a clustering analysis to evaluate the filter's ability to form a suitable image representation for SDM.